Abstract: This paper proposes an on-line two-phase fault
diagnosis algorithm for arbitrarily connected networks. The
algorithm addresses a realistic fault model that considers crash
and value faults in the nodes. Faults are diagnosed by
comparing the heartbeat messages generated by neighboring
nodes and disseminating the decision made at each node.
Theoretical analysis shows that the time and message complexity
of the diagnosis scheme is O(n) for an n-node network. The
message and time complexity are comparable to those of existing
state-of-the-art approaches, which makes the scheme well suited
to the design of fault-tolerant wireless communication networks.

Index Terms: On-line diagnosis, two-phase diagnosis, value
faults, dynamic fault environment.

I. Introduction 

Distributed, arbitrarily connected networks such as
mobile ad hoc networks and sensor networks are becoming
popular due to their extensive use in social, commercial and
scientific applications. These networks may be deployed in
unattended and possibly hostile environments. A hostile
environment affects the monitoring infrastructure, and nodes
become more susceptible to component failures.
Incorporating a correct and timely fault diagnosis capability into
the system with low overhead is essential to improve
system reliability and availability. An important element for
the timeliness of online diagnosis is the ability to execute
diagnostic tests without interrupting system operation, that
is, without explicit testing capabilities. A well-known solution
is the comparison approach, where multiple nodes execute
the same task and the outcomes are compared by other nodes
[1][2]. The agreements and disagreements among the
nodes are the basis for identifying the faults. This paper
follows this diagnosis approach, with heartbeat messages
broadcast periodically. In distributed self-diagnosis,
every node in the network needs to record the status of all
other nodes.

Motivated by this need, a two-phase on-line distributed
diagnosis approach for arbitrarily connected networks is
proposed. A synchronous system model is chosen for
simplicity of presentation: a distributed system
framework using a round-based (synchronous) message
dispersal protocol is considered. Diagnostic latency and
message complexity are used as the performance measures to
evaluate the proposed fault diagnosis algorithm. A
typical scalar wireless sensor network is considered as an
arbitrary network, and the performance of the proposed
algorithm is evaluated by simulation.
The specific contributions of this paper are listed as follows:

1. Proposes a generic diagnosis scheme that identifies crash
and value faults with high accuracy while maintaining low time,
message and energy overhead.

2. Presents both analytical and simulation results to establish
the correctness and completeness of the algorithm.

II. Related Works 

System-level fault diagnosis was introduced by Preparata, 
Metze and Chien in 1967 [3], as a technique intended to 
diagnose faults in a wired interconnected system. Previously
developed distributed diagnosis algorithms were designed 
for wired networks [1^1] and hence not well suited for wireless 
networks. Fault detection and diagnosis in
wireless networks has been studied extensively in the literature [5-11].
The problem of identifying crashed nodes in a WSN
is studied in [5], which proposes the WINdiag
diagnosis protocol; it creates a spanning tree (ST) for
dissemination of diagnostic information. Thomas et al. [6]
have investigated the problem of target detection by a sensor 
network deployed in a region to be monitored. The 
performance comparison was performed both in the presence 
and in the absence of faulty nodes. Elhadef et al. have 
proposed a distributed fault identification protocol called 
Dynamic-DSDP for MANETs which uses a ST and a gossip 
style dissemination strategy [7]. In [8], a localized fault 
diagnosis algorithm for WSN is proposed that executes in 
tree-like networks. The approach proposed is based on local 
comparisons of sensed data and dissemination of the test 
results to the remaining sensors. In [9] the authors present a 
distributed fault detection algorithm for wireless sensor 
networks where each sensor node identifies its own state 
based on local comparisons of sensed data against some 
thresholds and dissemination of the test results. The fault
detection accuracy of a detection algorithm decreases
rapidly when the number of neighbour nodes to be diagnosed
is small and the node failure ratio is high. Krishnamachari et
al. have presented a Bayesian fault recognition algorithm to 
solve the fault-event disambiguation problem in sensor 
networks [10]. 

III. System and Fault Model

A. System Model 

The communication network is assumed to be error-free and
to deliver messages reliably. We consider a round-based
communication model, in which system nodes send messages
periodically, i.e., at the period boundaries.
The system under consideration accommodates $n$
nodes. Each node occupies a position $(x, y)$ inside a fixed
geographic area ($l \times l$ m$^2$), and the nodes are initially
uniformly distributed. Two nodes $v_i$ and $v_j$ are within
transmission range $R$ if the Euclidean distance $d(v_i, v_j)$
is less than $R$. The topology
graph $G = (V, E)$ consists of a set of vertices $V$ representing
the nodes of the network and the set $E$ of undirected edges
corresponding to communication links between nodes. Each
node in the network maintains a neighbor table $N(\cdot)$ which
stores the IDs of its 1-hop neighbors. All nodes execute the same
workload (for example, sensing the temperature of the
environment) and determine the output value $x_i$. This value
is communicated to all other nodes. An arbitrary network
with connectivity $k$ is assumed. Every node is assigned
a node ID and can detect the absence or timing deviation
of an expected message.
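
As a minimal sketch of how the neighbor tables could be derived from the node positions under the transmission-range rule above, consider the following. The function name `build_neighbor_tables` and the sample coordinates are illustrative assumptions and do not come from the paper.

```python
import math


def build_neighbor_tables(positions, radius):
    """Return {node_id: set of 1-hop neighbor IDs}.

    Two nodes are neighbors when their Euclidean distance is
    strictly less than the transmission range `radius`.
    """
    tables = {v: set() for v in positions}
    ids = sorted(positions)
    for idx, vi in enumerate(ids):
        for vj in ids[idx + 1:]:
            if math.dist(positions[vi], positions[vj]) < radius:
                # Links are undirected, so record both directions.
                tables[vi].add(vj)
                tables[vj].add(vi)
    return tables


# Hypothetical three-node layout: only nodes 1 and 2 are in range.
positions = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (10.0, 0.0)}
tables = build_neighbor_tables(positions, radius=6.0)
```

The pairwise scan is quadratic in the number of nodes, which is acceptable for a sketch; a deployed node would instead learn its neighbors from received messages.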

B. Fault Model 

We consider crash and value faults in nodes. Links are
assumed to be fault free. A crash-faulty node is unable to
communicate with the rest of the network, whereas a node
with a value fault continues to operate and communicate with
unpredicted behavior. These malfunctioning (value-faulty)
sensors can still participate in network activities since
they remain capable of routing information.

C. Time Synchronization 

The proposed algorithm requires synchronization, since the sensor
readings at each diagnosis interval are exchanged to establish a
protocol for correct and complete diagnosis. One of the key
lightweight time synchronization protocols for WSNs is the Timing-sync
Protocol for Sensor Networks (TPSN) [12]. TPSN achieves
time synchronization with periodic time synchronization
messages. It maintains a global time in the network by
organizing the system into levels. Level discovery is
performed at the initial time when the network is deployed.
The sink is the root of the network and is assigned level 0. A
node at a lower level accepts the time sync packets from nodes
in the upper level and drops all other time sync packets from
its lower level and from peers in the same level. Finally, the
whole WSN follows the clock of the sink.
This work modifies the original TPSN for diagnosis
settings. It uses the UDG-NNT algorithm [13] to construct
an ST in which each node is assigned a rank. The sink node has
the highest rank in the network. Each node $v_i$, except the sink
node, selects the nearest node $v_j$ among its neighbor nodes
such that $rank(v_i) < rank(v_j)$ and sends a connect message
to $v_j$ to inform it that $(v_i, v_j)$ is an edge in the ST. This work
introduces a level maintenance phase which ensures a
connected ST. Since this hierarchical structure is already needed
for time synchronization, creating and maintaining it should
not be considered an overhead exclusive to the diagnosis algorithm.
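
The parent-selection rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the UDG-NNT algorithm of [13] itself: ranks are taken as given inputs, and the function name `build_nnt` and the sample layout are hypothetical.

```python
import math


def build_nnt(positions, ranks, radius, sink):
    """Return {child: parent} for the spanning tree.

    Each non-sink node picks the nearest node within transmission
    range whose rank is strictly higher than its own.
    """
    parent = {}
    for vi, pos_i in positions.items():
        if vi == sink:
            continue
        candidates = [
            (math.dist(pos_i, pos_j), vj)
            for vj, pos_j in positions.items()
            if vj != vi
            and ranks[vj] > ranks[vi]
            and math.dist(pos_i, pos_j) < radius
        ]
        if candidates:
            # min() orders by distance first, then by node ID on ties.
            parent[vi] = min(candidates)[1]
    return parent


# Hypothetical layout: node 0 is the sink and holds the highest rank.
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (2.0, 1.0)}
ranks = {0: 3, 1: 2, 2: 1, 3: 0}
tree = build_nnt(positions, ranks, radius=1.5, sink=0)
```

A node that finds no higher-ranked neighbor in range is left without a parent in this sketch; in the paper that situation is the concern of the level maintenance phase.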

IV. The Algorithm

This work considers two fault categories: 1) the set of missing
messages, which are those messages that node $v_i$ believes node $v_j$
failed to issue, and 2) the set of improper logical messages,
which are those messages that are correctly delivered but disagree
with $x_i$, the result of $v_i$'s own voting process on the messages
received. A formal description of the detection algorithm is
presented in Algorithm 1.
Algorithm 1: Detection algorithm

First phase

1. Broadcast the test message.
2. Set timer for $T_{out}$.
3. If $T_{out}$ = true then
4. Detect unreported nodes as hard faulty.
5. Obtain the sensor readings of all neighbours.
6. If $v_i$ agrees with $v_j$ ($v_j$ is an element of $N(v_i)$) then
7. Detect $v_j$ as fault free.
8. Repeat steps 6 and 7 for all $v_j$, $j = 1, \ldots, |N(v_i)|$, and generate
the local fault table.
9. Broadcast this local fault table.

Second phase

10. Upon receiving the local fault tables from 1-hop neighbors,
$v_i$ compares the local fault tables.
11. If more than half of the reporting nodes mark $v_j$ faulty, then
$v_i$ finally detects $v_j$ as faulty.
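
The two phases of Algorithm 1 can be illustrated with a short sketch. It assumes a fixed deviation threshold for the comparison and a fully connected example, so every node sees every local fault table. The names `local_fault_table`, `final_decision` and `tolerance` are hypothetical and not taken from the paper.

```python
def local_fault_table(own_reading, received, neighbors, tolerance):
    """Phase 1 at one node: map each neighbor to 1 (faulty) or 0 (fault free)."""
    table = {}
    for vj in sorted(neighbors):
        if vj not in received:
            table[vj] = 1  # no heartbeat before T_out: missing message
        elif abs(received[vj] - own_reading) > tolerance:
            table[vj] = 1  # delivered but disagrees: improper logical message
        else:
            table[vj] = 0
    return table


def final_decision(vj, tables):
    """Phase 2: vj is faulty if more than half of the reporting tables accuse it."""
    votes = [table[vj] for table in tables if vj in table]
    return sum(votes) > len(votes) / 2


# Hypothetical 4-node example; node 4 reports a deviant reading.
readings = {1: 25.0, 2: 25.5, 3: 24.8, 4: 80.0}
tables = [
    local_fault_table(
        readings[vi],
        {vj: r for vj, r in readings.items() if vj != vi},
        set(readings) - {vi},
        tolerance=2.0,
    )
    for vi in sorted(readings)
]
faulty = [vj for vj in sorted(readings) if final_decision(vj, tables)]
```

In this example node 4 accuses every other node, but each of those accusations is outvoted by the two agreeing fault-free nodes, so only node 4 is finally marked faulty.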

In the first phase of the algorithm, each node periodically
executes the diagnostic workload and disseminates the results
through a round of message transmissions to all other nodes.
A node detects a crash or missing-message fault when it does
not receive a test message before $T_{out}$. If a message is
delivered and arrives at the receiving node in a valid manner
but is incorrect (i.e., the reading does not match the receiver's
own reading), the message is recorded as an improper logical
message and the sending node is marked value faulty. We call
this phase the local diagnosis phase. In this phase a node
determines the validity of each 1-hop neighbor by comparing its
own message with the received message. The comparison need not
seek an exact match in value; it can instead apply a range or
deviance check. If the received value is well within the range
of the node's own value, the node accepts the message as correct;
otherwise it records it as incorrect. An adversary node
may also send an erroneous message in its header, which
may not be detectable during this phase. We show that these
faults are detected by the second phase of diagnosis. In the
second phase the local results available at each node are
further exchanged with other nodes, and a counter maintained
at every node is incremented by one for every positive
diagnosis. If the counter value at a particular node for
another node is greater than half of the nodes, a majority of
nodes have detected that node as faulty, and all other nodes
that recorded it as fault free are accused of being faulty.
If the accusation against a node was recorded as faulty in the
previous round, this node is considered faulty in the current
round. Both phases of the two-phase diagnosis procedure are
executed in a pipelined manner to improve diagnostic latency.

The primary fault table of a node $v_i$, $FT(v_i)$, represents the
union of the test outcomes due to improper logical messages
and missing messages in the first phase. The table entry
corresponding to any node $v_j \in N(v_i)$ is a binary input: 0
corresponds to a fault-free input received from $v_j$ as perceived
by $v_i$, and 1 represents a fault perceived by $v_i$. In the
second phase this work defines a function
$f_{v_i}(v_j) = |\bigcup FT(v_k)|$, where $v_k \in N(v_i)$.
This function is used to count the number of
accusations on a processor $v_j$ by all other processors. Thus
$f_{v_i}(v_j)$ is an integer where $0 \leq f_{v_i}(v_j) \leq (n-1)$.

ACEEE Int. J. on Network Security, Vol. 03, No. 01, Jan 2012

The local diagnostic views are disseminated to obtain a global
diagnostic view of the network. Once ST maintenance is
completed, the leaf nodes of the ST start the dissemination phase by
sending their local diagnostic views to their parents. Once the sink
node has the global diagnostic view, the synchronization phase
is triggered and the global view is embedded in the time sync
packet of the sink node. Thus, at the end of the synchronization
phase all nodes in the network have the global view of the
network.
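
The upward half of this dissemination can be sketched as a bottom-up merge along the spanning tree. The sketch assumes, purely for illustration, that each local view is the set of node IDs a node has diagnosed as faulty and that views are combined by set union; the paper does not specify the merge rule, and the name `gather_global_view` is hypothetical.

```python
def gather_global_view(parent, local_views, sink):
    """Merge local views from the leaves up to the sink and return the sink's view."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    # Visit nodes top-down from the sink so that every node appears
    # after its parent; the reversed order then handles leaves first.
    order, stack = [], [sink]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    merged = {v: set(local_views.get(v, set())) for v in order}
    for node in reversed(order):
        if node != sink:
            merged[parent[node]] |= merged[node]  # child reports to parent
    return merged[sink]


# Hypothetical chain 3 -> 2 -> 1 -> 0 with node 0 as the sink.
parent = {1: 0, 2: 1, 3: 2}
local_views = {1: set(), 2: {4}, 3: {4, 5}}
global_view = gather_global_view(parent, local_views, sink=0)
```

The traversal is iterative rather than recursive because the tree depth can reach n - 1 in the worst case.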

V. Basic Analysis of the Algorithm

The formal analysis of the algorithm involves establishing the
following two important properties:

Correctness: Every node diagnosed to be faulty by a non-faulty
node is indeed faulty.

Completeness: Every faulty node is identified.

Fig. 1. Example to show correctness of the algorithm: panels (a) and (b)

First, we consider correctness, which states that if a good
processor accuses some other processor, the accused
processor is indeed faulty.

Theorem 1. (Correctness). If a node $v_i$ is faulty, then all
fault-free nodes diagnose $v_i$ as faulty.

Proof. The only situation in the algorithm in which a good node $v_i$
can declare another node $v_j$ faulty is when
$f_{v_i}(v_j) \geq [|N(v_i) \cap N(v_j)|/2]$.
For ease of understanding, we consider
the example shown in Fig. 1. Let node-6 represent $v_i$ and
node-7 represent $v_j$. Fig. 1(a) assumes that all neighbor nodes of
node-6 and node-7 are fault free. These two nodes share
node-11 and node-12 as their common neighbors. Here node-6
correctly detects node-7 as fault free since
$f_6(7) > [|N(6) \cap N(7)|/2]$.
In the scenario depicted in Fig. 1(b), node-6 receives
positive remarks only from node-7 and thus
$f_6(7) < [|N(6) \cap N(7)|/2]$.
Thus node-6 incorrectly detects node-7 as
faulty. However, node-1 detects node-7 as fault free since
$f_1(7) \geq [|N(1) \cap N(7)|/2]$.
In the dissemination phase each node
sends its local diagnostics to the node in the upper level. Thus
the incorrect decisions taken by nodes are corrected by the
nodes at the higher level, and finally the diagnostic information
at the sink node at the end of local dissemination contains the
exact fault set.
The upper bound on the time complexity is expressed in terms of
an upper bound $T_p$ on the time needed to propagate a message
between sensor nodes.

Theorem 2. (Completeness). The diagnosis algorithm
terminates before a bounded delay
$T_{complexity} = (2n-1)T_p + 2T_{out} + T_{processing}$.

Proof. The detection phase takes at most $2T_{out} + T_{processing}$
time to obtain the local diagnostic view, where $T_{processing}$
is the time taken by nodes to process the diagnosis message. In the ST
maintenance phase, a node with a faulty parent needs at
most $3T_p$ time to get connected to the ST. In at most
$d_{st} T_p$, the sink node obtains the global diagnostic view
of the network, where $d_{st}$ is the depth of the ST. The sink
node disseminates this view, which reaches the farthest node in
at most $d_{st} T_p$. In the worst case $d_{st} = n - 1$. Now,
the upper bound on the time complexity can be expressed as
$T_{complexity} = (2n-1)T_p + 2T_{out} + T_{processing}$

Theorem 3. The proposed algorithm has a worst-case message
exchange complexity of $O(n)$ in the network.

Proof. In the first phase each node sends the diagnostic
message to its neighbors, costing one message per node, i.e.
$n$ messages in the network. Similarly, in the second phase
$n$ diagnostic messages are exchanged.
Building the ST with the sink as root costs at most $2n$ message
exchanges. Each node, excluding the sink, sends one local
diagnostic message. Each node, excluding the leaf nodes, sends
one global diagnostic message, and in the worst case the depth of
the ST is $n-1$. Thus, the message cost for disseminating diagnostic
messages is $2(n-1)$. So, the total number of exchanged
messages is
$M_{cost} = 6n - 2 = O(n)$
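
The tally in the proof of Theorem 3 can be checked with a few lines of arithmetic. The function name `message_cost` and the per-term variable names are illustrative only; the four terms are the ones stated in the proof.

```python
def message_cost(n):
    """Worst-case number of messages for an n-node network, term by term."""
    first_phase = n              # one heartbeat broadcast per node
    second_phase = n             # one local fault table broadcast per node
    tree_building = 2 * n        # stated upper bound for building the ST
    dissemination = 2 * (n - 1)  # local views up the ST, global view down
    return first_phase + second_phase + tree_building + dissemination


# The sum matches the closed form 6n - 2 for the simulated network sizes.
for n in (100, 500, 1000):
    assert message_cost(n) == 6 * n - 2
```

Since every term is linear in n, the total grows linearly with the network size.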

VI. Simulation Results

This section presents the performance of the proposed scheme
as evaluated by simulation. This work uses OMNeT++ as the
simulation tool, and all simulations are conducted on
networks using IEEE 802.15.4 at the MAC layer. The
simulation parameters are summarized in Table 1.

TABLE 1. SIMULATION PARAMETERS

Parameter             Value
Number of nodes       100-1000
Network grid          From (0,0) to (1000,1000)
Sink                  At (75,150)
Simulation time       300 sec
Propagation scheme    Two Ray Ground
Antenna scheme        Omnidirectional


Fig. 2 shows the communication complexity of the proposed
protocol. From the simulation result it is evident that the
communication complexity of this work outperforms
the present state-of-the-art schemes. The energy consumed by each
node is proportional to the amount of traffic it generates or
receives. Thus, the energy overhead of the proposed scheme
is lower, which in turn improves the network lifetime of a WSN.
Fig. 2. Message complexity of the proposed algorithm
Fig. 3. Time complexity of the proposed algorithm

Fig. 3 shows the time complexity of the proposed
scheme. From Theorem 2 it follows that the dissemination of
diagnostics contributes most to the diagnosis latency. The depth
of the ST determines the diagnosis latency, as the ST is used to
disseminate diagnostics. Thus, as expected, the time required
to diagnose the WSN increases almost linearly with the
number of nodes.

Conclusions 

This paper presents a diagnosis scheme to address the
fundamental problem of identifying faulty (value and
crash) nodes in an arbitrarily connected network. The proposed
work assumes that at most $a$ nodes are faulty at
any time $t$, where $a$ is the connectivity of the network. If
more than $a$ nodes are faulty, the detection
accuracy in obtaining the local view is less affected, but
the global view is severely affected since the network gets
partitioned. The message and time complexity of the proposed
model is O(n), which is low compared with present
state-of-the-art approaches. Due to its low message and time
complexity, the model could be integrated into fault-tolerant
systems.